Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • bysort | first non-missing calculation

    Hello all:

    I am trying to retrieve the earliest or first non-missing anycomp by studyid and the code I am using is clearly not working. I tried to see if egen and cond would work but can't think up the right combination.

    bysort studyid: gen basecomp = anycomp[_n] if !missing(anycomp)

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int studyid byte sercytopt int cytodate2 float anycomp
     1 1 17282 .
     1 2 17527 .
     1 3 17533 .
     1 4 17706 .
     1 5 19837 .
     1 6 20788 1
     1 7 21039 .
     1 8 21109 .
     1 9 21186 1
     2 1 19045 .
     3 1 20621 1
     3 2 20760 1
     3 3 21843 1
     3 4 21931 1
     4 1 20529 .
     4 2 20529 .
     5 1 20836 .
     5 2 20836 .
     6 1 17153 .
     6 2 19428 .
     6 3 19446 1
     6 4 19512 1
     6 5 20391 1
     6 6 20906 .
     6 7 20977 .
     6 8 21216 1
     7 1 20996 .
     7 2 22530 1
     8 1 20831 .
     8 2 20831 .
     9 1 20423 .
     9 2 20913 .
     9 3 21010 .
     9 4 21012 1
    10 1 20571 .
    10 2 22238 .
    11 1 20915 .
    11 2 21055 1
    12 1 16068 .
    12 2 20023 .
    12 3 20102 .
    12 4 20167 .
    12 5 20255 .
    12 6 20432 .
    12 7 20520 .
    13 1 20083 .
    13 2 20383 .
    14 1 20110 .
    14 2 20240 .
    14 3 20436 .
    14 4 20754 1
    15 1 17932 .
    15 2 17981 1
    15 3 18242 .
    15 4 19753 .
    15 5 20151 1
    16 1 20124 .
    16 2 20125 .
    16 3 20237 .
    16 4 20240 1
    17 1 19382 .
    17 2 19502 1
    17 3 20180 .
    18 1 16329 .
    18 2 18653 .
    18 3 19404 .
    18 4 19442 .
    18 5 19613 .
    18 6 19712 .
    18 7 19935 .
    18 8 20139 .
    19 1 19822 .
    19 2 20136 .
    19 3 21103 .
    20 1 20121 .
    20 2 20121 .
    21 1 20143 .
    22 1 20200 .
    22 2 20200 .
    23 1 19471 .
    23 2 19485 .
    23 3 19836 .
    23 4 20016 1
    23 5 20096 1
    23 6 20230 1
    23 7 20235 1
    23 8 20293 .
    23 9 20334 .
    24 1 17555 .
    24 2 18014 .
    24 3 19472 .
    25 1 17219 .
    25 2 20341 .
    25 3 20370 .
    25 4 21537 .
    26 1 18156 .
    26 2 19190 1
    26 3 19646 .
    26 4 20314 1
    26 5 20375 1
    end
    format %td cytodate2

  • #2
    There is a first() function for egen in egenmore on SSC, but it's as or more instructive to work it out from principles. Here are two ways to do it.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int studyid byte sercytopt int cytodate2 float anycomp
     1 1 17282 .
     1 2 17527 .
     1 3 17533 .
     1 4 17706 .
     1 5 19837 .
     1 6 20788 1
     1 7 21039 .
     1 8 21109 .
     1 9 21186 1
     2 1 19045 .
     3 1 20621 1
     3 2 20760 1
     3 3 21843 1
     3 4 21931 1
     4 1 20529 .
     4 2 20529 .
     5 1 20836 .
     5 2 20836 .
     6 1 17153 .
     6 2 19428 .
     6 3 19446 1
     6 4 19512 1
     6 5 20391 1
     6 6 20906 .
     6 7 20977 .
     6 8 21216 1
     7 1 20996 .
     7 2 22530 1
     8 1 20831 .
     8 2 20831 .
     9 1 20423 .
     9 2 20913 .
     9 3 21010 .
     9 4 21012 1
    10 1 20571 .
    10 2 22238 .
    11 1 20915 .
    11 2 21055 1
    12 1 16068 .
    12 2 20023 .
    12 3 20102 .
    12 4 20167 .
    12 5 20255 .
    12 6 20432 .
    12 7 20520 .
    13 1 20083 .
    13 2 20383 .
    14 1 20110 .
    14 2 20240 .
    14 3 20436 .
    14 4 20754 1
    15 1 17932 .
    15 2 17981 1
    15 3 18242 .
    15 4 19753 .
    15 5 20151 1
    16 1 20124 .
    16 2 20125 .
    16 3 20237 .
    16 4 20240 1
    17 1 19382 .
    17 2 19502 1
    17 3 20180 .
    18 1 16329 .
    18 2 18653 .
    18 3 19404 .
    18 4 19442 .
    18 5 19613 .
    18 6 19712 .
    18 7 19935 .
    18 8 20139 .
    19 1 19822 .
    19 2 20136 .
    19 3 21103 .
    20 1 20121 .
    20 2 20121 .
    21 1 20143 .
    22 1 20200 .
    22 2 20200 .
    23 1 19471 .
    23 2 19485 .
    23 3 19836 .
    23 4 20016 1
    23 5 20096 1
    23 6 20230 1
    23 7 20235 1
    23 8 20293 .
    23 9 20334 .
    24 1 17555 .
    24 2 18014 .
    24 3 19472 .
    25 1 17219 .
    25 2 20341 .
    25 3 20370 .
    25 4 21537 .
    26 1 18156 .
    26 2 19190 1
    26 3 19646 .
    26 4 20314 1
    26 5 20375 1
    end
    format %td cytodate2
    
    gen ismiss = missing(anycomp)
    bysort studyid (ismiss cytodate2) : gen wanted = anycomp[1]  
    
    by studyid: egen whenfirst = min(cond(!missing(anycomp), cytodate2, .)) 
    by studyid : egen wanted2 = min(cond(cytodate2 == whenfirst, anycomp, .))  
    
    tabdisp studyid, c(whenfirst wanted*)
    
    ----------------------------------------------
      studyid |  whenfirst      wanted     wanted2
    ----------+-----------------------------------
            1 |      20788           1           1
            2 |                                   
            3 |      20621           1           1
            4 |                                   
            5 |                                   
            6 |      19446           1           1
            7 |      22530           1           1
            8 |                                   
            9 |      21012           1           1
           10 |                                   
           11 |      21055           1           1
           12 |                                   
           13 |                                   
           14 |      20754           1           1
           15 |      17981           1           1
           16 |      20240           1           1
           17 |      19502           1           1
           18 |                                   
           19 |                                   
           20 |                                   
           21 |                                   
           22 |                                   
           23 |      20016           1           1
           24 |                                   
           25 |                                   
           26 |      19190           1           1
    ----------------------------------------------
    The code would need to be more elaborate is missing meant any of .a to .z and it was important to know which of those occurred first whenever all values were missing for a given identifier.

    See also

    https://www.stata.com/support/faqs/d...t-occurrences/


    https://journals.sagepub.com/doi/pdf...867X1101100210 Section 9

    Comment


    • #3
      That is very informative to see the way you used cond. The code worked neatly for me. I guess I will have to collapse the data by firstnm wanted2 by studyid to get the desired group of tagged cases. Is there a way I can tag only the first wanted 2 without collapsing the file, Nick Cox

      Comment


      • #4
        If you want only the first non-missing value for each identifier (which isn't explicit in #1) then using my code is a waste of time as collapse (firstnm) exists to do the job directly.

        Comment


        • #5
          Originally posted by Nick Cox View Post
          There is a first() function for egen in egenmore on SSC, but it's as or more instructive to work it out from principles. Here are two ways to do it.


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int studyid byte sercytopt int cytodate2 float anycomp
          1 1 17282 .
          1 2 17527 .
          1 3 17533 .
          1 4 17706 .
          1 5 19837 .
          1 6 20788 1
          1 7 21039 .
          1 8 21109 .
          1 9 21186 1
          2 1 19045 .
          3 1 20621 1
          3 2 20760 1
          3 3 21843 1
          3 4 21931 1
          4 1 20529 .
          4 2 20529 .
          5 1 20836 .
          5 2 20836 .
          6 1 17153 .
          6 2 19428 .
          6 3 19446 1
          6 4 19512 1
          6 5 20391 1
          6 6 20906 .
          6 7 20977 .
          6 8 21216 1
          7 1 20996 .
          7 2 22530 1
          8 1 20831 .
          8 2 20831 .
          9 1 20423 .
          9 2 20913 .
          9 3 21010 .
          9 4 21012 1
          10 1 20571 .
          10 2 22238 .
          11 1 20915 .
          11 2 21055 1
          12 1 16068 .
          12 2 20023 .
          12 3 20102 .
          12 4 20167 .
          12 5 20255 .
          12 6 20432 .
          12 7 20520 .
          13 1 20083 .
          13 2 20383 .
          14 1 20110 .
          14 2 20240 .
          14 3 20436 .
          14 4 20754 1
          15 1 17932 .
          15 2 17981 1
          15 3 18242 .
          15 4 19753 .
          15 5 20151 1
          16 1 20124 .
          16 2 20125 .
          16 3 20237 .
          16 4 20240 1
          17 1 19382 .
          17 2 19502 1
          17 3 20180 .
          18 1 16329 .
          18 2 18653 .
          18 3 19404 .
          18 4 19442 .
          18 5 19613 .
          18 6 19712 .
          18 7 19935 .
          18 8 20139 .
          19 1 19822 .
          19 2 20136 .
          19 3 21103 .
          20 1 20121 .
          20 2 20121 .
          21 1 20143 .
          22 1 20200 .
          22 2 20200 .
          23 1 19471 .
          23 2 19485 .
          23 3 19836 .
          23 4 20016 1
          23 5 20096 1
          23 6 20230 1
          23 7 20235 1
          23 8 20293 .
          23 9 20334 .
          24 1 17555 .
          24 2 18014 .
          24 3 19472 .
          25 1 17219 .
          25 2 20341 .
          25 3 20370 .
          25 4 21537 .
          26 1 18156 .
          26 2 19190 1
          26 3 19646 .
          26 4 20314 1
          26 5 20375 1
          end
          format %td cytodate2
          
          gen ismiss = missing(anycomp)
          bysort studyid (ismiss cytodate2) : gen wanted = anycomp[1]
          
          by studyid: egen whenfirst = min(cond(!missing(anycomp), cytodate2, .))
          by studyid : egen wanted2 = min(cond(cytodate2 == whenfirst, anycomp, .))
          
          tabdisp studyid, c(whenfirst wanted*)
          
          ----------------------------------------------
          studyid | whenfirst wanted wanted2
          ----------+-----------------------------------
          1 | 20788 1 1
          2 |
          3 | 20621 1 1
          4 |
          5 |
          6 | 19446 1 1
          7 | 22530 1 1
          8 |
          9 | 21012 1 1
          10 |
          11 | 21055 1 1
          12 |
          13 |
          14 | 20754 1 1
          15 | 17981 1 1
          16 | 20240 1 1
          17 | 19502 1 1
          18 |
          19 |
          20 |
          21 |
          22 |
          23 | 20016 1 1
          24 |
          25 |
          26 | 19190 1 1
          ----------------------------------------------
          The code would need to be more elaborate is missing meant any of .a to .z and it was important to know which of those occurred first whenever all values were missing for a given identifier.

          See also

          https://www.stata.com/support/faqs/d...t-occurrences/


          https://journals.sagepub.com/doi/pdf...867X1101100210 Section 9
          Thanks for the egenmore. This below worked per you suggestion. Learnt one more thing for the day..

          egen testfirst = ifirst(anycomp), v(1) by(studyid)

          Comment

          Working...
          X